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Abstract: Normal mean-variance mixture distributions are widely applied to simplify 
a model’s implementation and improve their computational efficiency under the Maximum 
Likelihood (ML) approach. Especially for distributions with normal mean-variance mix¬ 
tures representation such as the multivariate skewed variance gamma (MSVG) distribu¬ 
tion, it utilises the expectation-conditional-maximisation (ECM) algorithm to iteratively 
obtain the ML estimates. To facilitate application to financial time series, the mean is 
further extended to include autoregressive terms. Techniques are proposed to deal with 
the unbounded density for small shape parameter and to speed up the convergence. Sim¬ 
ulation studies are conducted to demonstrate the applicability of this model and examine 
estimation properties. Finally, the MSVG model is applied to analyse the returns of five 
daily closing price market indices and standard errors for the estimated parameters are 
computed using Louis’s method. 

Keywords: EM algorithm, maximum likelihood estimation, multivariate skewed vari¬ 
ance gamma distribution, normal mean-variance representation, unbounded likelihood. 


1 Introduction 


Variance Gamma (VG) distribution has been widely used to model financial time series 
data, particularly the log-price increment (return) which has the characteristics of having 
high concentration of data points around the center with occasional extreme values. Some 
key properties includes finite moments of all orders and a member of the family of ellipti¬ 


cal distributions in the symmetric case. See Madan and Seneta (1990) for more details of 


the properties of the VG distribution and its financial applications. However, its applica¬ 
bility is still limited by its rather complicated density. To estimate the model parameters 
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Madan and Seneta (1987) applied the ML approach using characteristic function from 


a Chebyshev polynomial expansion. Madan and Seneta (1990) and |Tjetjep and Seneta 


(2006) used the method of moments to estimate the parameters for the VG and skewed 


VG models respectively. Finlay and Seneta (2008) compared the performance of various 


estimation methods for data with long range of dependence, namely the methods of mo¬ 
ments, product-density maximum likelihood, minimum y 2 , and the empirical characteris¬ 
tic function. They showed that the performance of product-density maximum likelihood 
estimator is better than the method of moments and minimum y 2 estimators in terms 


of goodness-of-fit. McNeil et al. (2005) (§3.2.4) applied the Expectation-Maximisation 


(EM) algorithm to generalised hyperbolic (GH) distribution which nests VG distribution 
as a special case. However, they did not address the main challenge in estimation when 
the density is unbounded at the center, a feature of some financial time series. Fung 


and Seneta (2010) adopted a Bayesian approach with diffuse priors as a proxy for ML 


approach to estimate the bivariate skewed VG and Student-t models. They implemented 
the models using the Bayesian software WinBUGS, and compared their goodness-of-fit per¬ 
formances with simulated and real data sets. We propose to use EM algorithm under the 
ML approach to estimate parameters of the VG model directly. Our proposed method 
not only provides improved accuracy but also handles the problem of unbounded density. 

It is well known that the VG distribution has normal mean-variance mixtures rep¬ 
resentation in which the data follow a normal distribution condition on some mixing 
parameters which have a gamma distribution. Large values of the mixing parameters 
correspond to the normal distributions having relatively larger variances to accommo¬ 
date outliers. Hence outlier diagnosis can be performed using these mixing parameters 


(Choy and Smith. 1997). More importantly, the conditional normal data distribution 


greatly simplifies the parameter estimation based on standard results. However, the mix¬ 
ing parameters are not observed. In case with missing data, ML estimation becomes 
very challenging as the marginal likelihood function involves high dimensional integra¬ 
tion. Dempster et al. (1977) showed that the EM algorithm can be used to find the ML 


estimates for univariate Student-t distribution with fixed degrees of freedom in normal 
mean-variance mixture representation. Dempster, Laird and Rubin (1980) extended these 


results to the regression case. Meng and Rubin (1993) improved the EM algorithm to 


Expectation/Conditional Maximisation (ECM) algorithm which simplifies the maximisa¬ 
tion step by utilising some standard results of normal distribution when conditioned on 
some mixing parameters. Moreover, the ECM algorithm still possesses desirable proper¬ 


ties from the EM algorithm. Meng (1994) considered a variation of the ECM algorithm 


called multicycle ECM (MCECM) which inserts extra E-steps after each update of param- 
Liu and Rubin (|1994) advanced the ECM algorithm to Expectation/Conditional 


eters. 


Maximisation Either (ECME) by maximising the actual likelihood rather than the con¬ 


ditional/constrained expected likelihood. Liu and Rubin (1995) applied the MCECM 
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and ECME algorithms to obtain the ML estimates for multivariate Student-t distribu¬ 
tion with incomplete data. They also found that the ECME algorithm converges much 


more efficiently than the EM and ECM algorithms in terms of computational time. Hu 


and Kercheval (2008) used the MCECM algorithm with the Student-t distribution for 


portfolio credit risk measurement. 


We adopt the ECM algorithms for VG distribution and extend it to multivariate 
skewed VG (MSVG) distribution because univariate symmetric VG distribution fails to 
account for the dynamics of cross-correlated time series with asymmetric tails. For exam¬ 
ple, the returns of daily price for stocks in related industries are cross-correlated across 
stocks, serially correlated over time and possibly right or left tailed. We further include 
autoregressive (AR) terms in the means to allow for autocorrelation. This generalised 
model called MSVG AR can describe many essential features of financial time series. 
However this model, as well as its estimation methodologies that can cope with various 
computational difficulties in ML implementation, is still relatively scarce in the literature. 
For the ECM algorithms, these computational difficulties include accurately calculating 
the ML estimates with unbounded likelihood while maintaining computational efficiency. 


To deal with these technical issues and evaluate the performance ECM algorithms, 
we perform three simulation studies. Firstly, as the two ECM type algorithms, namely 
the MCECM and ECME algorithms, have a trade-off between computational efficiency 
and performance, we derive a method we call the hybrid ECM (HECM) to improve the 
efficiency whilst also maintaining good performance. We compare the efficiency of these 
three algorithms through simulated data. Secondly, we propose to bound the density 
within a certain range around the centre should it become unbounded when the shape 
parameter is small and observations are close to the mean. We show that the proposed 
technique improves computational efficiency and performance of the HECM algorithm. 
Lastly, we analyse the performance of the algorithm for different MSVG distributions 
with high or low levels of shape and skew parameters. Results are promising. We then 
apply the HECM estimator to fit the MSVG model to multiple stock market indices. To 
assess the significance of parameter estimates, standard errors are also evaluated using 
Louis’s method (Louis, 1982). Results show that the MSVG AR model provides good 
fit to the data and reveals high correlation between some pairs of indices. We then fit a 
bivariate model to a pair of highly correlated market indices. The contour plot of fitted 
density reveals that the bivariate model captures the high concentration of observations 
in the center and heavy outliers on the edge. Results facilitate portfolio setting based on 
a basket of market indices. 


In summary, our proposed HECM estimator is new in the literature of VG models and 
forms a useful toolkit to its implementation with all important technical issues clearly 
addressed. We also showcase its efficiency and demonstrate its applicability through real 
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examples. Nevertheless the techniques provided in HECM estimation including the way 
to handle unbounded density can be generalised to other distributions with normal mean- 
variance mixtures representation. The structure of the paper is as follows. Section [2] 
introduces the MSVG distribution and some of its key properties. Section [3] presents 
the ECM algorithms for estimating the model parameters and the calculation of their 
standard errors by computing the information matrix through Louis’s formula. We also 
propose some techniques to address the technical issues in the ECM algorithms. Section 
[4] describes the simulation studies to assess the performance of proposed techniques in 
handling these technical issues. Section [5] implements the HECM algorithm to real data. 
Finally, a brief conclusion with discussion is given in Section [ 6 j 


2 Multivariate Skewed Variance Gamma Model 


While univariate VG models has been considered in many times in the literature, MSVG 
models are in fact more applicable because they reveal the dependence between different 
components and allows asymmetric tail behavior. Let yi, i = l,...,n be a set of d- 
dimensional multivariate observations following a MSVG distribution with the probability 
density function (pdf) given by: 


fvc(y ) = 


ii-i/ 


V 2 




2 z/ + 7 / S SKi/-/*)'£ 1 (y - y) ) exp ((y - y)'T, H) 


E| 3 vrfr (u) \{2u + ys-WXy - y)'^~\y - [1 + 


, 2 v—d 
2 


( 1 ) 


where /i £ K“ is the location parameter, X is a d x d positive definite symmetric scale 
matrix, 7 e is the skewness parameter, v > 0 is the shape parameter, T(-) is the 


gamma function and K v 
rj (see ~ 


is the modified Bessel function of the second kind with index 


Gradshteyn and Ryzhik 2007, §9.6). Based on this parametrisation, decreasing the 


shape parameter v will increase the probability around /1 as well as the tail probabilities at 
the expense of probability in the intermediate range. See Fung and Seneta (2007) for more 
information about the shape parameter v and the tail behavior which is of power-modified 
exponential-type. 


The MSVG distribution has a normal mean-variance mixtures representation given 
by: 

ViW ~ + j\i, AjS), A(2) 

where C}(a,(3) is a Gamma distribution with shape parameters a > 0, rate parameter 
f3 > 0 and pdf: 

/g(A) = A Q_ 1 exp(-/3A), for A > 0 . 

r (a) 
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The mean and variance of a MSVG random vector Y t are given by: 


E(^)=/x + 7 and Co v(YJ) = E + £ 77 ', (3) 


respectively. , The pdf in ([!]) as t/j —>• y is given by: 


fvciVi 


2"~ d TT--z |S|"5 ^ 


rM (2i/ + 7 / S _1 7) ”"2 


l srl 7|) ‘og w 

2-^-1 i S 7 r (i-d ' 


rw <?“ 


d—2v 


if v > f, 

if z/ = - 
11 ^ 2 , 

if zx < f, 


where 6? = (y t — y) , 'E^ 1 (y i — y). This shows that the pdf at /x is unbounded when 
zy < |. This is an important property of the MSVG distribution to be addressed in the 
next section. 

Figure [T| gives four pairs of contour and three dimensional plots for various cases of 
the MSVG distribution. The first pair of plots is based on parameters 


y = 



£ 
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v = 3. 


Based on the distribution for the first pair of plots, three other pairs of plots demonstrate 
the changes in pdf when the shape parameter decreases to v = 0 . 6 , the skewness parameter 
increases to 7 = (0.5, 2), and the correlation coefficient in £ increases to 0.8, respectively, 
while keeping other parameters fixed. Note that the density for the case with small v is 
unbounded. 


3 ECM Algorithm 

In the likelihood approach, maximising the complicated density in (JTj) can be computa¬ 
tionally intensive. We propose the ECM algorithm in the likelihood approach because 
the normal mean-variance mixtures representation in (J2| can greatly simplify the max¬ 
imisation procedures in the M-step of the algorithm. Then we extend the algorithm to 
incorporate an AR(1) term and discuss some technical issues involving small v. 
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(a) standard contour plot (b) small v contour plot 



(c) standard 3D plot 


(d) small v 3D plot 



(e) large 7 contour plot (f) large correlation contour plot 



(g) large 7 3D plot 


(h) large correlation 3D plot 


Figure 1: Various contour and 3D plots of bivariate skewed VG model with for different 
parameters. In the contour plots, the bold lines represent level sets {0.1, 0.2, 0.3, 0.4}, and 
the dashed lines represent level sets {0.05,0.15,0.25,0.35}. The range for the 3D plots is 
kept between 0 and 0.4 
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3.1 Basic algorithm 


The ML estimates for parameters 0 = (/z, E, 7 , v) in the parameter space © maximises 
the log-likelihood 

n 

t(0\y 1 ,...,y n ) = ^log f VG ( yi \0) 

i=i 

where fvc(') is fl ie pdf of MSVG distribution in ([!]). Using the normal mean-variance 
mixtures representation of the MSVG distribution in ([2]), we consider A = {Ai,A n } to 
be unobserved and {j/, A} to be the complete data where y = (yi ,..., y n ). The complete 
data density can be represented as a product of conditional normal densities given the 
unobserved mixing parameters and the gamma densities of the mixing parameters, that 
is, 

n 

f(y, A|0) = Y[f N {yi\\,v,'2,'Y)fG(Wv)- 

i= 1 

This is essentially a state space model with normals as the structural distributions, A* 
as the state or missing parameters, and gamma as the mixing distribution. Due to the 
mixing structure, the complete-data log-likelihood function can be factorised into two 
distinct functions as follows: 

£(6\y,\) = Uv(/x, S,7|y, A) + i G [y |A) (4) 

where the log-likelihood of the conditional normal distributions is given by: 

n i n i 

^Jv(/i,S,7|j/,A) = -|log|S| - - ^ —(yi - n - X^yE^iyi - y, - Xa) + C L (5) 

i=1 1 

for some constant C\ and the log-likelihood of the conditional gamma distributions is 
given by: 


£ g 0 |A) = nv log v — n logT(i/) + (u - 1 ) y^logA^ - ( 6 ) 

i= 1 i=l 

Hence, condition on A, the estimation of all parameters can be separated in 

two blocks: the conditional maximisation (CM) of (/x, S, 7 ) from the conditional normals 
log-likelihood function and the CM of u from the gamma log-likelihood function. The 
mixing parameter A is first estimated by the conditional expectation given the observed- 
data y. The procedures are described as below: 

E-step for Ap 

To derive the conditional expectations of the Aj’s given the observed-data yi and param- 
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eters (/z, X, 7 , z/), we need the conditional posterior distribution of A* given by: 


/(Ai|t/i,/z,X, 7 ,z/) oc /(Aj,?/ ; :|/z,X, 7 ,z/) 


oc A 


y-S-l 


exp 


X 7 + 2z/) 


( 7 ) 


which corresponds to the pdf of a generalised inverse Gaussian distribution (Embrechts 


1983). Using this posterior distribution, we can calculate the following conditional expec¬ 


tations: 


K = E(Ai|y,0) = 


S t K v l+l ( V / 2z/ + 7'X- 1 7^) 


1/Aj — E ( — 


y,o = 


\j2v + 7 / X~ 1 7 K v _d ^a/2z/ + 7 , X- 1 7^ 
a/ 2 z^ + 7 , X- 1 7 K v d } ( a/ 2 z/ + 7 / S 1 7 (h 


'( 1 . 0 ) 


log A* = E (log Aj| y, 0) = log 


Si 


+ 


K^ii^v + yn-^Si) 


( 8 ) 


(9) 


( 10 ) 


a/2zz + 7 / X ~ 1 7 y K v _d(y/2v + 7'X- 1 75 i ) 

where K^ ,0 \z) = J^K a (z) | which is approximated using the central difference formula 




K u+h (z) - K„ h (z 


v—h \ 


2 h 


( 11 ) 


where we let h — 10 


— 1 n-5 


CM-step for (/z, X, 7): 

Given the mixing parameters A, the ML estimate of (/z, X, 7 ) can be obtained by max¬ 
imising Gv(Ab X, 7 I 2 /, A) in (J5]) with respect to (/z, X, 7 ). After equating each component 
of the partial derivatives of £n(/x, X, 7 1zx, y , A) to zero, we obtain the following estimates: 


„ _Sy/\S\ - nSy 

/I o O 9 5 

hi/Ah A - n 2 

„ S v — nix 
7 =- fL ^-, 

*->A 

A 1 „ n 1 1 

s =- v^ Vi ~ ~ — 'yi'Sx, 

n 2 —' A* n 


where the complete data sufficient statistics are 


n 

Sy = 'y ^ yi , 'S'y/A 

i=1 



Sa = X) 

i= 1 


Sl/A 



( 12 ) 

(13) 

(14) 


(15) 

























CM-step for v: 

Given the mixing parameters A, the ML estimate of v can be obtained by numerically 
maximising (,q{v\X) in (| 6 ]) with respect to v using Newton-Raphson (NR) algorithm where 
the derivatives is given by: 


di 

dv 

dH 
dv 2 


n + n log v — mj){y) + S iog A - S a, 
— — nv'(v). 


(16) 

(17) 


where ^(x) = ^logr(a:) is the digamma function and 5'i og a = ^“^logA*. This ECM 
algorithm is the MCECM algorithm. In summary, the algorithm involves the following 
steps: 

Initialisation step: Choose suitable starting values (/x 0 , E 0 , 7 o> ^o) f° r the parameters. 
If the data is roughly symmetric around the mean, then it is recommended to choose 
starting values ( y , Q y: 0, d ) where y and Q y denote the sample mean and sample variance- 
covariance matrix of y respectively. 


At the f-th iteration with current estimates (p (t \ S^, 7 ^, zA^), 

E-step 1: Calculate A) ’ and 1/A* for i = 1 in (J8J) and (|9]), respectively, 

using (/x^,£^, 7 ^),z/W). Calculate also the sufficient statistics S^^ 2 \ S^ +1 ^ 2 ' 1 and 

d‘r /2) in en- 

CM-step 1: Update the estimates to (/xh+T, 5 ]h+ 1 ),'y(*+ 1 )) i n (12) to (14) respectively 
using the sufficient statistics. 

-(t+i) -—-(t+i) — 

E-step 2: Calculate A; andlogAj for i = 1,in (k>p and (10) respectively using 


the updated estimates (/A m \ £ (t+1 \ 7 ^ +1 ), v^). Calculate also the sufficient statistics 
d‘ +1> and in 

CM-step 2: Update the estimate to zA t+1 ) using the derivatives in (16) and (17) for the 
NR procedure and the sufficient statistics. 


Stopping rule: Repeat the procedures until the relative increment of log-likelihood 
function is smaller than the tolerance level 10 ” 8 . 


Alternatively, the ECME algorithm update the estimate to zA t+1 ) in CM-step 2 by 
maximising directly the actual log-likelihood ivGi^y, R, S, 7 ) which is the sum of log 
pdfs in Q for (yi ,..., y n ). This procedure is implemented using the optimize function 
specified in R which is a combination of golden section search and successive parabolic 
interpolation. 
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3.2 Algorithm with AR term 

Suppose now our observed-data is y = ( y 0 , ...,y n ). The MSVG distribution for y t where 
2 = 1, ...,n with AR mean function of order 1 (without loss of generosity) can be repre¬ 
sented hierarchically as follows: 

yi\Xi ~ J\f d {f3 0 + /3ij/i_i + 7 Ai, A*E), A i~G(u,v) (18) 

where (3q e M. d , and Pi is a d x d matrix with all eigenvalues less than 1 in modulus. 
Because of the dependency structure, we maximise the conditional likelihood function for 
y 1 ,...,y n given y Q expressed by: 

n 

f(y i, •••> Vn , Ai,A„| j/ 0 ;0) = JJ J/i- 1 ; e )fc(K-, *0- 

2=1 

So the conditional log-likelihood is given by: 

4(0|y, A) = MA), /3i, S, 7|y, A) + Ig{v\\) 

where the log-likelihood of the conditional normal distribution is given by: 

MA),/3i,£,7|y, A) 

n 1 . n , 1 

= log l S l - 2 ~\( yi ~ ~ -Po- PiVi-i - Ai7) + C 2 

i= 1 ( 

(19) 

for some constant C 2 and £g(i/|A) is given in equation Q. 


E-step for A*: 

This step is similar to the E-step in Section 3.1 Just replace Sf = (y t — ft)' E ' 1 (y, 
with Sf = ( y t -fa- PiVi-i)' S' 1 (y* - Pa - PiVi-i)- 


M) 


CM-step for (/3 0 , Pi, E, 7 ): 

Similarly condition on A*, the ML estimates of (/3o, Pi, E, 7 ) can be obtained by maximis¬ 
ing £n(Po, Pi: E, 7 1 j/. A) in (19) with respect to (/3 0 , /3i, E, 7 ). This gives us a close form 
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solution as follows: 


(Po\ 

I Pi 

\77 


£ 


Si/ A 

or 

°cc/A 

n \ 

— _L 

/ or 

1 

\ 

Sx/ A 

Sxx'/\ 

S x 


Sxy' /X 

n 

S'x 

Sx) 


K Sy 

/ 


1 . n ^ 1 1 

Po- Piyi-i){yi -Po- PiVi-iY - 

\ n 


n z —' A 

1=1 


( 20 ) 

( 21 ) 


where (20) as a generalisation of (12) and (13) can be further generalised to include higher 


order AR terms in the mean function. Then the complete data sufficient statistics are 


given by (15) with the addition of 


Sx ^ ' Vi 1 ) S x j\ ^ ^ ^ Ui—li S- 


Z— 1 


i=l 


^xx '/A 


^ A 

i=i 


'Vi—XVi—Xi S X y'/\ 


^ X. 

i =1 ‘ 


■Vi-iVi- 


( 22 ) 


CM-step for in 

This step is the same as in Section 3.1 


In summary, the layout of the MCECM and ECME algorithms are similar to those 


for the MSVG model in Section |3.1| except that we update /3 0 and /3i instead of p and 
adjust the E-step accordingly. 
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3.3 Adjustment to ECM algorithm 
3.3.1 Adjustment for small v 

Numerical problems may occur when dealing with v < | since the pdf fvciVi) i n (Jl]) at 
y is unbounded for such u. To handle such case, we can show that as —> y, 


E(Aj| yi,0) 


2u—d 


2^+7 , S- 1 7 


1 


( 2 i/+ 7 , £- 1 7 ) log (Si 

r( ^~2 +l) 2^-d+i pi/ + ys-y) 


o ~ v ~ 1 xd- 2v 


if V > f, 

if v = i. 
11 ^ 2’ 

if v< f, 





' 2^+7 , £~ 1 7 
2is—d—2 


- { 2 u + yx A) log h, 


r(i-i/+ 2 ) ol _ 2;y+d 

r +-f) 


(2z/ + yx y) 


^ 2 x.2v—d—2 


S'f log (5i 
d—2v 


if i/ > | + 1 , 
if i/ = | + 1 , 

if ^ e (|, | + l), 


if z/= 2 , 


if v < 


E(log Ai|yi,0) 


vy-f) 

log <5; 

2 log Si 


- log 


2k+7 , £~ 1 7 

2 


if V > f, 
if Z/= 

if ^ < I- 


So for z/ < | + 1 and v < |, E(^-|5 i; 7 , u) and E(log Aj|<5j, 7 , zz) are unbounded for data 
points at the mean /j, respectively. As these expected values are numerically unstable to 
calculate, we propose to bound the pdf around y by a bound such that if 


diV 2u + 7 ' Y. 1 7 < A, 


(23) 


where A is a small fixed constant, then we compute the expected values E( 4 -|(!>*, 7 , u) and 
E(log Aj|y, 7 , u) in Q and (10) with 5* = A/ y"2z/ + 7 'X~y replacing h,. The region 


in (23) will be called the delta region. We will analyse the optimal choices of A in a 


simulation study in the next section. 

Another problem may occur when estimating X. Suppose that yt —s- y^> for some 
-(t+ 1 / 2 ) 

z, then 1/Aj —> 00 . Now consider the CM-step for (y, X, 7 ), /W+ 1 ) needs to be 

calculated before calculating X( t+1 h For the z th term of the first summation for calculating 
X' t+1) in (14), we get that 


1 /a/ + 7 \yi - y {t+1) )(yi 


y (t+l ))' —>• 00 , 


12 






















which leads to Xd+b diverging to infinity since (y t — /A f+ b) does not converge to zero. 

—(£+ 3 / 4 ) — (t_ |_ 3 / 4 ') 

To solve this problem, we insert an extra E-step to update 1/A * and A) ' ; after 

updating (/A t+ b ) 'y(*+ 1 + This adjustment makes the i th term of the first summation in 


X^+b to be 


1/A/ t+3/ % - A {t+1) ){Vi - A {t+1) )' c 


for some finite constant matrix C resulting in a more numerically stable estimate for 
Sd+b. Thus adding an additional E-step to update 1/Aj after updating /id+b an d 
-y d +1 ) before updating X (4 +b improves the numerical stability and convergence of the 
algorithm. 


3.3.2 Hybrid ECM algorithm 


Meng and Rubin (1993), Liu and Rubin (1994) together showed that the MCECM and 


ECME algorithm always increase the log-likelihood monotonically. However, problems 
might occur when dealing with multimodel log-likelihood such as the MSVG log-likelihood 


when v < 


In the MCECM algorithm, v is estimated by numerically maximising the log-likelihood 


of the conditional gamma mixing density using NR algorithm with derivatives (16) and 


0- The ECME algorithm which numerically maximises the actual log-likelihood of the 
MSVG distribution with pdf ({T]) , is computationally less efficient than the MCECM algo¬ 
rithm despite less iterations are required for convergence. However, the ECME algorithm 
is more stable than the MCECM algorithm since the CM-step for v maximises the actual 
log-likelihood, hence avoiding the missing data log \ in CM-step 2. 

In light of this, we propose an algorithm which combines the efficiency of the MCECM 
algorithm while maintaining the performance of the ECME algorithm. We call this the 
hybrid ECM (HECM) algorithm. Essentially it starts with the MCECM algorithm and 
repeats until either the relative increment of log-likelihood is smaller than 10 -8 which stops 
the algorithm, reverts back to the previous estimates and performs the ECME algorithm. 

For the univariate skew VG model, the performance of the HECM algorithm can be 


improved by replacing the estimate for jl in (12) with the estimate for /i such that it 


maximizes the actual log-likelihood ivcihlv, c 2 , 7, u )- 

The two adjustments using density bound and extra E-step as well as the hybrid 
procedures balance computational efficiency, and accuracy of the ML estimation. 


3.4 Observed Information Matrix 


Letting Y c = (y, A) be the complete data, Louis (JI982) expressed the observed information 
matrix I 0 (Q) in terms of the conditional expectation of the derivatives of complete data 
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log-likelihood £'(6\Y C ) and £"(6\Y C ) with respect to conditional distribution \\y which 
depends on 6. 


I o (0 ) = -E e [£"(0\Y c )\ -E e [£'(6\Y c )£’ T (d\Y c )] +E 0 [£\d\Y c )}E e [£'(d\Y c )] 


(24) 


In the context of MSVG model, £(0\Y C ) is given by ([I]). The hrst derivatives of complete 
data log-likelihood can be expressed as 


2 =1 


d£ 


N 


<9vec/3i 

8£n 


= vec 


s 1 Y) T-(yi- A) - PiVi-i - 1 


2=1 


avechS = Vedl _ 7 ) ° vech ( 1 + 1 S vy'l a S 1 


<91? 


w 


<97 

<9z/ 


= n 


S 1 - A) - - Ai7) 

2=1 

n n 

+ n log z/ — m^{u) + log A* — A* 


2=1 


2=1 


where vec(-) and vech(-) are the vectorisation and half-vectorisation operators respectively, 
1 pxp is the p x p matrix of l’s, I be a conformable identity matrix, and 

n 1 

Syy'/X = ^2 y( y i ~ @0 ~ PlUi-1 - A*7 )(Vi - A - PlVi-1 ~ Kl)'■ 


2=1 


Calculating the conditional expectations in equation (24) requires the second order 


derivatives which can be obtained using vector differentiation, and the following condi¬ 
tional expectations 


Ee (A?) 
E e (A* log A,) 

((log A,) 2 ) 


07 K u _d (<5*0) 

k 


o 7 (^i0) u ~2 +k 


K 


( 1 , 0 ) 


(<5i0) + lo g ( ) K v-i+k (&0) 


log if 


2 K < 2 J°i (W) + 2 log (!) y°l (W) 


+ 


(<5*0) 


where we let 0 = \/t / E' _ 1 7 + 2z/ and Iil 2 ’°\z ) = ^A" a (z)| a=jy which is approximated 
using central difference formula. 

The asymptotic covariance matrix of G can be approximated by the inverse of the 
observed information matrix I o (0). This gives us a way to approximate the standard 
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error of 9i = (6)i 


SE d 


W) 


-l 


(25) 


4 Simulation Study 


We perform three simulation studies: firstly, to compare the efficiency of the MCECM, 
ECME and HECM algorithms with the extra E-step mentioned in Section |3.3.1 secondly, 
to analyse the optimal choices of A on increasing the dimension of MSVG distribution; 
and lastly, to assess the performance of HECM algorithm for different levels of shape 
and skewness parameters. We use the statistical program called R to generate r random 
samples from the MSVG distribution each of sample size n = 1000 and estimate the 
parameters for each sample. We set the true parameters to be 


V = 


X = 



7 = 



v = 3 


for the bivariate base model and use initial values recommended in Section 3.1 that is 


(Ho, So, 7o, v o) — {V: Q y , 0, 2 ). To improve convergence, we multiply the data by C = 100 . 
Thus we have the scaled parameters as Cfi, C 2 X and C 7 for / 4 , X and 7 respectively. The 
results are then rescaled back and for each random sample, we calculate the log-likelihood 
to assess the model fit. 


4.1 Efficiency across algorithms 

To illustrate that the three algorithms, MCECM, ECME, and HECM provide good es¬ 
timates, we perform the simulation above with r = 1000 replications using the bivariate 
base model and compare their relative performance. In Table [l] below, the average of pa¬ 
rameter estimates, log-likelihoods, number of iterations for convergence over replications, 
and the total computational times are presented. For each algorithm, we generate the 
same set of simulated data using the same seed to eliminate the effect of sampling errors 
in the comparison. 
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Table 1: Parameter estimates and model efficiency measures across algorithms 



true 

MCECM 

ECME 

HECM 

ill 

0 

-0.0072 

-0.0072 

-0.0072 

^ 2 

0 

-0.0069 

-0.0069 

-0.0069 

Si.1 

1 

0.9959 

0.9959 

0.9959 

El,2 

0.4 

0.3973 

0.3973 

0.3973 

^2,2 

1 

0.9914 

0.9914 

0.9914 

7i 

0.2 

0.2067 

0.2067 

0.2068 

72 

0.3 

0.3062 

0.3062 

0.3062 

V 

2.5 

2.5699 

2.5709 

2.5710 

loglik 


-2713.41 

-2713.41 

-2713.41 

conv.iter 


75.6 

31.8 

76.6 

time (hr) 


8.1 

16.4 

9.1 


All estimates obtained are reasonably close to the true value which supports the valid¬ 
ity of these algorithms. As expected, MCECM algorithm is the most efficient in terms of 
computation time whereas ECME algorithm converges the fastest in terms of the number 
of iterations. However the ECME and HECM algorithms provide slightly better model 
fits when comparing more digits of the averaged log-likelihood. Overall HECM algorithm 
is preferred as it provides better model fit than the MCECM algorithm and runs faster 
than the ECME algorithm. Hence HECM algorithm will be adopted in all subsequent 
analyses. 


4.2 Efficiency across A for small shape parameter 

In this simulation study, we analyse the behavior of estimates for different levels of A 
as defined in Section |3.3.1| We begin the simulation study with bivariate skewed VG 
distribution with parameters as in Table [l] but with v = 0.6 as the true shape parameter. 
The shape parameter is chosen to satisfy the condition v < | for unbounded density. We 
repeat the experiment for different levels of A but we do not report average log-likelihood 
because it can be estimated to be as large as possible by reducing the value of A. For 
each A level, we repeat the estimation procedure r = 1000 times. Generally from the 
results of the experiment as shown in Tabic [2j the further away the A is from the range 
(le-5,le-3), the further away are the estimates from their true values. Hence, an optimal 
range of A is from le-5 to le-3. 

This experiment is repeated for dimension d = 3 with u — 1. The results for these 
experiments are given in Tables [3] 
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Table 2: Parameter estimates across levels of A for skewed bivariate VG model. 




A levels 


true 

le-300 

le-100 

le-50 

le-10 

le-7 

le-5 

le-4 

le-3 

hi 

0 

0.0040 

0.0040 

0.0040 

0.0040 

0.0040 

0.0040 

0.0040 

0.0039 

h2 

0 

0.0058 

0.0058 

0.0058 

0.0058 

0.0058 

0.0058 

0.0058 

0.0056 

£ 1,1 

1 

1.2118 

1.0600 

1.0434 

1.0111 

1.0043 

0.9997 

0.9961 

0.9916 

Sl,2 

0.4 

0.4880 

0.4251 

0.4184 

0.4054 

0.4027 

0.4009 

0.3994 

0.3976 

^2,2 

1 

1.2149 

1.0607 

1.0438 

1.0116 

1.0049 

1.0003 

0.9967 

0.9921 

7l 

0.2 

0.1939 

0.1910 

0.1932 

0.1952 

0.1952 

0.1953 

0.1951 

0.1950 

72 

0.3 

0.2904 

0.2861 

0.2894 

0.2925 

0.2925 

0.2925 

0.2923 

0.2921 

V 

0.6 

0.3300 

0.4421 

0.4964 

0.5771 

0.5885 

0.5965 

0.6010 

0.6053 


Estimate in bold is closest to the true value. 


Table 3: Parameter estimates across levels of A for skewed trivariate VG model. 




A levels 


true 

le-7 

le-6 

le-5 

le-4 

le-3 

le-2 

le-1 

hi 

0 

-0.0019 

-0.0019 

-0.0019 

-0.0019 

-0.0019 

-0.0018 

0.0013 

h 2 

0 

-0.0024 

-0.0024 

-0.0024 

-0.0024 

-0.0024 

-0.0022 

-0.0020 

h3 

0 

-0.0012 

-0.0012 

-0.0012 

-0.0012 

-0.0012 

-0.0011 

-0.0015 

£l,l 

1 

1.0092 

1.0070 

1.0049 

1.0026 

0.9989 

0.9952 

0.9889 

Si,2 

0.4 

0.4020 

0.4012 

0.4003 

0.3994 

0.3980 

0.3966 

0.3942 

Si ,3 

0.3 

0.3019 

0.3013 

0.3007 

0.3000 

0.2989 

0.2978 

0.2960 

S 27 

1 

1.0094 

1.0073 

1.0051 

1.0029 

0.9992 

0.9956 

0.9893 

^2,3 

0.2 

0.2007 

0.2003 

0.1999 

0.1995 

0.1988 

0.1981 

0.1969 

S 3 ,3 

1 

1.0128 

1.0107 

1.0086 

1.0064 

1.0027 

0.9991 

0.9926 

7l 

0.2 

0.2012 

0.2012 

0.2012 

0.2012 

0.2010 

0.2007 

0.1999 

72 

0.3 

0.3026 

0.3026 

0.3026 

0.3026 

0.3025 

0.3019 

0.3013 

73 

0.4 

0.4018 

0.4018 

0.4018 

0.4018 

0.4016 

0.4010 

0.4009 

V 

1 

0.9512 

0.9602 

0.9696 

0.9795 

0.9912 

1.0013 

1.0225 


Estimate in bold is closest to the true value. 


On increasing the dimension, the optimal ranges for A is (le-5,le-l). As for the sub¬ 
sequent simulation experiment, we implement A within the optimal ranges for dimension 
2 to 3. As results show that the optimal A range only increases slowly with d, we apply 
the optimum A range for d — 3 to the five dimensional MSVG model in the real data 
analysis. 

For the univariate case, we use the modified HECM algorithm as mentioned in Section 


possible to directly compare the algorithm for high dimensional cases. As the A level gets 
bigger, a 2 and 7 increases towards the true value. However, v increases away from the 


3.3.2 The results are presented below in Table Hi Due to the modification, it is not 
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true value. Since the optimal A range is not clear for the univariate case, we suggest any 
A in the range (le-10,le-3). 

Table 4: Parameter estimates across levels of A for symmetric univariate VG model 



true 

A levels 

le-300 

le -100 

le-50 

le -20 

le -10 

le-5 

le-3 

b 

0 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

0.0000 

a 2 

1 

0.8564 

0.8564 

0.8564 

0.8564 

0.8663 

0.8862 

0.9148 

7 

0.2 

0.1878 

0.1878 

0.1878 

0.1878 

0.1885 

0.1907 

0.1952 

V 

0.3 

0.3102 

0.3102 

0.3102 

0.3102 

0.3111 

0.3137 

0.3267 


4.3 Effects of shape and skewness parameters 

In this simulation study, we study the performance of HECM algorithm for various levels 
of skewness parameter 7 when the shape parameter is v = 3 and v = 0.6 which indicates 
large and small value respectively according to the condition v < We consider the same 
bivariate base model as described at the start of Section [4] except that the true skewness 
parameters are 7 = (0.2,0.2), (0.5,0.5), (0.7,0.7), (1,1), (2,2) and (0.5,2). We report the 
averaged iteration in which the algorithm switches from MCECM to ECME algorithm as 
denoted by switch iter, the averaged iteration when the algorithm converges by conv iter, 
and the total time it takes to run the algorithm. 


Table 5: Parameter estimates and model efficiency measures for v = 3 



true 

True ( 71 , 72 ) 

( 0 . 2 , 0 . 2 ) 

(0.5, 0.5) 

(0.7, 0.7) 

(U) 

( 2 , 2 ) 

(0.5,2) 

b 1 

0 

-0.0081 

-0.0129 

-0.0149 

-0.0176 

-0.0246 

-0.0071 

b 2 

0 

-0.0095 

-0.0143 

-0.0162 

-0.0186 

-0.0244 

-0.0292 

£ 1,1 

1 

0.9960 

0.9942 

0.9927 

0.9903 

0.9779 

0.9951 

£ 1,2 

0.4 

0.3985 

0.3963 

0.3947 

0.3921 

0.3790 

0.3932 

£ 2,2 

1 

0.9952 

0.9931 

0.9915 

0.9888 

0.9759 

0.9727 

7i 


0.2061 

0.5108 

0.7128 

1.0154 

2.0221 

0.5050 

72 


0.2084 

0.5131 

0.7150 

1.0173 

2.0229 

2.0277 

V 

3 

3.1006 

3.0856 

3.0785 

3.0699 

3.0477 

3.0535 

switch, iter 


99.31 

105.34 

111.89 

125.73 

241.16 

192.61 

conv. iter 


99.31 

105.34 

111.89 

125.73 

241.16 

192.61 

time (hr) 


11.5 

12.1 

12.3 

22.8 

41.2 

34.2 


Table [5] shows that all estimates are very close to their true values. On increasing the 
level of skewness, the biases of parameters [i, 7 and £ tend to increase gradually while 
the bias for v decreases gradually. The number of iterations to switch and converge and 
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hence the computation time also increase accordingly. This also applies for the unequal 
skewness case. Moreover, there is no reliance of ECME algorithm as the parameter v is 
chosen so that v > d/2. Indeed the simulation with larger skewness needs more iterations 
for the algorithm to converge. The results when v = 0.6 is presented in Table [6| 


Table 6: Parameter estimates and model efficiency measures for v = 0.6 



true 

True ( 71 , 72 ) 

(0.2, 0.2) 

(0.5, 0.5) 

(0.7,0.7) 

(1,1) 

(2.2) 

(0.5,2) 

111 

0 

0.0043 

0.0083 

0.0108 

0.0136 

0.0212 

0.0052 

^ 2 

0 

0.0042 

0.0082 

0.0110 

0.0138 

0.0213 

0.0213 

Si.1 

1 

0.9995 

1.0054 

1.0114 

1.0220 

1.0573 

1.0175 

Si, 2 

0.4 

0.4006 

0.4061 

0.4124 

0.4226 

0.4391 

0.4167 

S2,2 

1 

0.9990 

1.0044 

1.0111 

1.0217 

1.0540 

1.0558 

7i 


0.1950 

0.4904 

0.6876 

0.9845 

2.0337 

0.5007 

72 


0.1943 

0.4897 

0.6866 

0.9835 

2.0328 

2.0028 

V 

0.6 

0.5968 

0.5961 

0.5957 

0.5945 

0.5946 

0.5940 

switch, iter 


26.8 

27.6 

28.1 

23.7 

23.2 

26.5 

co nv. iter 


28.4 

28.0 

28.5 

24.5 

23.2 

26.5 

time (hr) 


4.9 

3.5 

3.7 

6.7 

5.8 

6.6 


The parameter estimates using HECM algorithm and optimal A to deal with un¬ 
bounded density are reasonably accurate. This justifies the choice of A for d = 2. Sur¬ 
prisingly the algorithm roughly needs around the same number of iterations for each 
skewness level. In comparison with the results in Table [5] for larger u, the algorithm 
requires significantly less iterations. Eventually, the HECM algorithm switch to ECME 
to obtain parameter estimates. This justifies our proposed HECM algorithm instead of 
relying solely on MCECM. 

In summary, the proposed HECM algorithm gives good parameter estimates for the 
MSVG distribution even when its shape parameter is small and skewness is heavy. 


5 Real Data Analysis 

To illustrate the applicability of HECM algorithm using the MSVG AR model, we con¬ 
sider the returns of five daily closing price indices, namely, Deutscher Aktien (DAX), 
Standard & Poors 500 (S&P 500), Financial Times Stock Exchange 100 (FTSE 100), All 
Ordinaries (AORD) and Cotation Assiste en Continu 40 (CAC) from 1 st January 2004 to 
31 st December 2012 with tolerance level of 10~ 10 . After filtering the data with missing 
closing prices, we obtain the data size of n = 2188. Plots of the five time series are given 
in Figure [2] As the summary statistics in Table [7] show that the data exhibit considerable 
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skewness and kurtosis, we begin our analyses with the MSVG model adopting a constant 
mean. To assess the model fit, we use the Akaike information criterion with a correction 
for finite sample size (AICc) given by: 


AICc 


AIC + 


2 k(k + 1) 
n — k — 1 


where k denotes the number of model parameters. Model with the lowest AICc value is 
preferred as it demonstrates the best model fit after accounting for model complicity. 




Figure 2: Time series plots for the five daily return series 


Table 7: Summary statistics for the five daily return series 



DAX 

S&P 500 

FTSE 100 

AORD 

CAC 

meant 

0.2901 

0.1096 

0.1223 

0.1587 

0.1755 

sd 

0.0146 

0.0136 

0.0127 

0.0114 

0.0287 

max 

0.1080 

0.1096 

0.0938 

0.0540 

0.1778 

min 

-0.0774 

-0.0947 

-0.0926 

-0.1049 

-0.2133 

skewness 

0.0208 

-0.2939 

-0.0946 

-0.7291 

0.1524 

kurtosis 

9.4529 

12.9395 

11.0138 

10.4846 

11.7483 


f The reported means are multiplied by 1000. 


The results for the estimated parameters and their standard errors are given in Table 

0 . 
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Tabic 8: Results for the Multivariate Skewed VG mode 



estimate 

standard error 

T 

R 

10 -4 (18 10 12 20 -7) 

10” 4 (3 3 3 3 6) 

£ 

10~ 5 

(19 10 14 5 12 \ 

14 8 2 17 

13 4 9 

12 0.4 

y 66 y 


10~ 6 

/7 5 6 4 9 \ 

5 4 3 9 

5 3 7 

5 7 

V 26^ 


T 

7 

10 -4 ( — 15 -9 -11 -19 9) 

1— 1 

o 

1 

V 

1.40 

0.054 

Corr 


/I 0.60 0.86 0.31 0.33^ 

1 0.58 0.13 0.57 

1 0.35 0.29 

1 0.01 

V 1 ) 



AICc 
conv iter 
time (sec) 

-69422 

41 

91 



The model is then extended to include an AR(1) term in the mean to describe the 
autocorrelation of the five return series. Similarly, results for the estimated parameters 
and their standard errors are given in Table [9j 


Table 9: Results for the Multivariate Skewed VG model with AR(1) term 



estimate 

standard error 

ft 

10~ 4 (16 13 10 14 2) 

10 -4 (3 3 3 2 6) 

Pi 

10" 2 

/ —8 35 -18 -1 2 \ 

8 -11 -4 -7 -1 

-14 37 -13 -1 -0.4 

-4 43 24 -21 -2 

\ 10 -27 1 -9-5 j 


10" 2 

/4 3 5 3 1\ 

3 3 4 2 1 

3 3 3 2 1 

2 2 3 2 1 

\6 6 8 5 2/ 


S 

10" 5 

/17 10 12 3 13\ 
14 8 2 17 

12 3 10 

7 2 

V 64 / 


10~ 6 

^7 5 5 3 9 ^ 

5 4 2 8 

5 2 7 

3 5 

V 25y 


T 

7 

10“ 4 ( — 13 -12 -8 -12 -0.2) 

10- 4 (4 4 4 3 8) 

V 

1.45 

0.057 

Corr 


(\ 0.65 0.85 0.29 0.38' 

1 0.64 0.22 0.56 

1 0.33 0.35 

1 0.09 

V 1 



AICc 
conv iter 
time (sec) 

-70866 

50 

111 
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The two sets of model parameters are not directly comparable. However the estimates 
of S and correlation matrices are very similar while 7 are some what similar. The standard 
error for the estimate f3\ in MSVG AR model suggests that only the second column has 
all parameters being significant. Since the second column of the (3\ matrix has relatively 
larger values, all five stocks are strongly lag-one cross-correlated with S&P 500. It is 
not surprising to know that the returns in S&P 500 has the most impact on each of the 
five returns the next day because S&P 500 has been shown to be a strong predictor for 
a number of market indices. This is due to its large market share and its minimal real 
time difference with lag-one return of other markets. Comparing the AICc of the two 
models, the MSVG AR model provides better model fit which is further illustrated in the 
histogram of residuals in Figure [3] after filtering out the AR(1) term. Fitted marginal pdfs 
with /3 0 as the mean of the MSVG model are added to the figure to facilitate comparison. 
Overall the two MSVG models provide good fits to the data despite occasionally, the peaks 
of the histogram and fitted pdf around the means does not match exactly possibly due 
to the rather strong assumption of a common shape parameter v across all components. 
This is a limitation of the MSVG distribution. 




(d) AORD (e) CAC 


Figure 3: Histograms of AR(1) filtered residuals for DAX, S&P 500, FTSE 100, AORD, 
CAC data sets 

From the correlation matrices of the two models, the pair of market indices DAX and 
FTSE 100 are highly correlation (p 13 ~ 0.85), possibly due to the strong competitiveness 
of the German and UK markets as they are the major stock markets in Europe. This 
gives rise to similar cross-correlation for both DAX and FTSE 100 with the other stocks. 
The model fit of the MSVG AR model for DAX and FTSE 100 bivariate return data is 
displayed in the contour plot in Figure [4} Since the parameter estimates are quite similar 
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to the five dimensional empirical study, we omit reporting these results. Overall the model 
captures the strong correlation between the two stocks, giving a good overall fit to the 
DAX and FTSE 100 data set. 



Figure 4: Fitted contour plot of AR(1) filtered DAX and FTSE 100 data sets 


6 Conclusion 


The MCECM and ECME algorithms have been proposed to obtain the ML estimates of 


multivariate Student-t distribution (Liu and Rubin, 1995). We advance the algorithms to 
HECM algorithm and demonstrate its applicability to the MSVG model with AR mean 
function. Our proposed HECM algorithm with two adjustments solves three technical 
issues in application. 


Firstly, HECM algorithm improves the overall computational efficiency by combining 
the speed of MCECM algorithm at the start of iterations with the stability of ECME algo¬ 
rithm at latter iterations. Secondly, when the shape parameter is small and observations 
cluster closely to the mean, the problem of unbounded MSVG density leading to diverged 
estimates 1/A * and log A* can be resolved by capping the density within certain range of 
A. We study the optimal choices of A ranges for dimension d — 2, 3. More studies are 
needed to verify the choices of A for higher dimensions. Lastly, we add an extra E-step 
after updating (/x^ +1 \ 'yh+ 1 )) but before updating X (4+1 ) to improve the stability of the 
S estimate when the MSVG density is again unbounded. 


The HECM algorithm is then applied to fit the MSVG model with an AR(1) mean 
structure to describe the dynamics in financial time series. To improve the model flexibility 
and predictability, the HECM algorithm can be easily extended to models with AR(p) 
terms. Moreover, the algorithm can be further enhanced to popular financial time series 


models such as GARCH models (Bollerslev, 1986) and with leverage effect (Engle and Ng 
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1993). Some distribution families such as the multivariate skew Student-t distribution or 


multivariate skew generalised hyperbolic distribution which nests the MSVG distribution 
may be considered because they can be expressed in normal mean-variance mixtures 
representation. While these extensions improve the applicability of the algorithm, it is 
very challenging as there is no close-form solution for parameters in the mean function 
and hence more layers of iterations are required. 

In summary, we show that the HECM algorithm improves the computation efficiency 
in the ML estimation for the complicated MSVG distribution in normal mean-variance 
mixtures representation. However, we also remark the limitation of the MSVG model 
with one shape parameter for all components. In practice, different time series may follow 
distributions with different shape parameters. Hence, to improve the model applicability, 
it is necessary to allow one shape parameter for each component resulting in a modified 


MSVG model, similar to the GARCH models in Choy et al. (2014) 
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